NSF PAR Search | NSF Public Access Repository

Memory Management in Complex Join Queries: A Re-evaluation Study

Jahangiri, S.; Carey, M.; Freytag, C. (November 2024, 2024 ACM Symposium on Cloud Computing (SoCC'24))

Efficient multi-join query processing is crucial but remains a complex, ongoing challenge for high-performance data management systems (DBMSs). This paper studies the impact of different memory distribution techniques among join operators on different classes of multi-join query plans under different assumptions regarding memory availability and storage devices such as HDD and SSD on Amazon Web Services (AWS). We re-evaluate the results of one of the early impactful studies from the 1990s that was originally done using a simulator for the Gamma database system. The main goal of our study is to scientifically re-evaluate and build upon previous studies whose results have become the basis for the design of past and modern database systems, and to provide a solid foundation for understanding basic “join physics”, which is essential for eventually designing a resource-based scheduler for concurrent complex workloads.
Full Text Available
Multi-Valued Indexing in AsterixDB

Galviso, G.; Carey, M. (March 2022, Proc. Int’l. Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP))

Full Text Available
On Multi-Valued Indexing in AsterixDB

Galviso, G.; Carey, M. (March 2022, Int’l. Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2022), co-located with EDBT 2022)
Stefanidis, K.; Golab, L. (Ed.)
Secondary indexes in relational database systems are traditionally built under the assumption that one data record maps to one indexed value. Nowadays, particularly in NoSQL systems, single data records can hold collections of values that users want to access efficiently in an ad-hoc manner. Multi-valued indexes aim to give users the best of both worlds: (i) to keep a more natural data model of records with collections of values, and (ii) to reap the benefits of a secondary index. In this paper, we detail the steps taken to realize multi-valued indexes in AsterixDB, a Big Data management system with a structured query language operating over a collection of documents. This includes (a) creating the specification language for such indexes, (b) illustrating data flows for bulk-loading and maintaining an index, and (c) discussing query plans to take advantage of multi-valued indexes for use in predicates with existential and universal quantification. We conclude with experiments that compare AsterixDB multi-valued indexes against similar indexes in MongoDB and Couchbase Query.
Full Text Available
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

https://doi.org/10.5441/002/edbt.2022.01

Pavlopoulou, C.; Carey, M.; Tsotras, V. (March 2022, Proc. EDBT Conf.)

Full Text Available
Columnar Formats for Schemaless LSM-based Document Stores

https://doi.org/10.14778/3547305.3547314

Alkowaileet, W.; Carey, M. (January 2022, Proceedings of the VLDB Endowment)

Full Text Available
Design Trade-offs for a Robust Dynamic Hybrid Hash Join

https://doi.org/10.14778/3547305.3547327

Jahangiri, S.; Carey, M.; Freytag, C. (January 2022, Proceedings of the VLDB Endowment)

Full Text Available
CH2: A Hybrid Operational/Analytical Processing Benchmark for NoSQL

https://doi.org/10.1007/978-3-030-94437-7_5

Carey, M.; Lychagin, D.; Muralikrishna, M.; Sarawathy, V.; Westmann, T. (January 2022, Proc. 13th TPC Technology Conf. on Performance Evaluation & Benchmarking (TPC TC))
Nambiar, R.; Poess, M. (Ed.)
Database systems with hybrid data management support, referred to as HTAP or HOAP architectures, are gaining popularity. These first appeared in the relational world, and the CH-benCHmark (CH) was proposed in 2011 to evaluate such relational systems. Today, one finds NoSQL database systems gaining adoption for new applications. In this paper we present CH2, a new benchmark – created with CH as its starting point – aimed at evaluating hybrid data platforms in the document data management world. Like CH, CH2 borrows from and extends both TPC-C and TPC-H. Differences from CH include a document-oriented schema, a data generation scheme that creates a TPC-H-like history, and a “do over” of the CH queries that is more in line with TPC-H. This paper details shortcomings that we uncovered in CH, the design of CH2, and preliminary results from running CH2 against Couchbase Server 7.0 (whose Query and Analytics services provide HOAP support for NoSQL data). The results provide insight into the performance isolation and horizontal scalability properties of Couchbase Server 7.0 as well as demonstrating the efficacy of CH2 for evaluating such platforms.
Full Text Available
PolyFrame: A Retargetable Query-based Approach to Scaling Dataframes

Sinthong, P.; Carey, M. (July 2021, Proceedings of the VLDB Endowment)
Full Text Available
Scale-independent Data Analysis with Database-backed Dataframes: a Case Study

Sinthong, P.; Carey, M.; Yao, Y. (March 2021, Proceedings of the 1st Int'l Workshop on Data Analytics and Machine Learning Made Simple (SIMPLIFY 2021))
Full Text Available
SmartBench: A Benchmark For Data Management In Smart Spaces

https://doi.org/10.14778/3407790.3407791

Gupta, P.; Carey, M.; Mehrotra, S.; Yus, R. (July 2020, Proceedings of the VLDB Endowment)

This paper proposes SmartBench, a benchmark focusing on queries resulting from (near) real-time applications and longer-term analysis of IoT data. SmartBench, derived from a deployed smart building monitoring system, is comprised of: 1) An extensible schema that captures the fundamentals of an IoT smart space; 2) A set of representative queries focusing on analytical tasks; and 3) A data generation tool that generates large amounts of synthetic sensor and semantic data based on seed data collected from a real system. We present an evaluation of seven representative database systems and highlight some interesting findings that can be considered when deciding what database technologies to use under different types of IoT query workloads.
Full Text Available
